Hierarchic Estimation

This chapter discusses hierarchic estimation.

Introduction to hierarchic estimation

This section provides an overview of hierarchic estimation.

Approaches to estimating very large matrices

There are formidable data processing and computational issues to be faced when estimating very large matrices, whose size may lie in the range of 2,500 to 10,000 zones for major transport studies. Theoretically, the matrices can have between 2,500² and 10,000² (6,250,000 to 100,000,000) cells to estimate, although the practical number of cells with non-zero trips will only be a fraction of this. Nevertheless, the number of cells to be estimated in typical applications will be of the order of 250,000 to 750,000 cells.

The natural approach, which is used in hierarchic matrix estimation, is to reduce the estimation problem to a more manageable size by grouping information. However, it is necessary to recognize that the pattern of trips across many large study areas, such as conurbations, is not readily partitioned. For example, a data item such as a flow count or a trip end may relate to trips with dispersed origins and destinations which may not easily be grouped.

It is therefore a feature of CUBE Analyst hierarchic estimation that each of the estimation approaches offered, which are described below, always considers all of the trips in the entire study area.

Different levels of detail: Districts and zones

The approaches offered by CUBE Analyst hierarchic estimation consider the OD matrix at two levels of detail:

Fine level, which is the original zoning system and results in a zonal matrix
Coarser level, which aggregates (groups) sets of zones into a limited number of districts, from which a corresponding district matrix may be produced

The total number of trips in the zonal and district matrices is the same.
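The relationship between the two levels can be illustrated with a minimal Python sketch. It is not CUBE Analyst code: the function names, the tiny 4-zone matrix, and the zone-to-district list are hypothetical, chosen only to show that grouping zones into districts preserves the total number of trips.

```python
def aggregate_to_districts(zonal, zone_to_district, n_districts):
    """Sum zonal OD cells into the district OD cells they belong to."""
    district = [[0.0] * n_districts for _ in range(n_districts)]
    for i, row in enumerate(zonal):
        for j, trips in enumerate(row):
            district[zone_to_district[i]][zone_to_district[j]] += trips
    return district


def matrix_total(matrix):
    """Total number of trips in a matrix held as a list of rows."""
    return sum(sum(row) for row in matrix)


# Hypothetical zonal matrix: zones 0-1 form district 0, zones 2-3 form district 1.
zonal = [
    [0, 5, 2, 1],
    [4, 0, 3, 2],
    [1, 2, 0, 6],
    [3, 1, 7, 0],
]
zone_to_district = [0, 0, 1, 1]

district = aggregate_to_districts(zonal, zone_to_district, 2)
print(district)                                     # district-level matrix
print(matrix_total(zonal), matrix_total(district))  # the two totals agree
```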

Different approaches to hierarchic estimation

The main method is called hierarchic estimation as the estimated district matrix is used to control a series of estimations primarily conducted at the zonal level. This process leads to a fully updated zonal matrix.

Hierarchic estimation also allows a variant method in which the district matrix is defined as a mixture of district and zonal detail. The resulting estimated district matrix includes some cells estimated at the zonal level. The output estimated matrix has fewer rows and columns than the input matrix, but there is a direct correspondence between certain cells, as selected by the user. This variant is valuable when the application needs to update only the cells relating to part of a large study area, for example, the cells for an administrative borough within a large city region. The method requires only a single estimation, rather than the series of estimations used in the main hierarchic estimation process. This hierarchic estimation variant is referred to as combined district and zonal estimation.

The underlying estimation process is common to all CUBE Analyst runs but there are differences in how information is grouped in hierarchic estimation. Apart from differences in information grouping, the combined district and zonal estimation is very similar to a standard estimation. The hierarchic estimation method introduces a new concept, which is called a local matrix. This is explained in Local matrices.

Alternative approaches to hierarchic estimation

This section describes alternative approaches to hierarchic estimation.

Estimation with mixed district and zonal detail

The majority of this section is concerned with hierarchic estimation, but it begins with a view of the approach for combined district and zonal estimation, shown in the figure Combined estimation of selected zones and districts. This shows the estimated matrix where the sides of the cells have been scaled according to the geographical size of the areas to which they relate. That is, the large sides correspond to districts and the small sides to zones. This has resulted in three types of cells:

Large squares — All information is estimated at district level
Small squares — All information is estimated at zonal level
Rectangles — Information is estimated at a mixture of district and zonal detail

Combined estimation of selected zones and districts

The user may choose whether to retain information at mixed levels of detail, as shown, or (manually) to extract the cells fully estimated at zonal detail (the small squares in the figure) to update a portion of the zonal prior matrix.

As shown in the figure, the detailed estimation has been for trips traveling from one part of the study area to another; if the small squares were located on the diagonal of the main square shown, then the detailed estimation would be for all trips within, and traveling to and from, a particular part of the study area, such as a town center area.

Some points to note about this approach are:

Although the terms zonal and district have been used to indicate different levels of detail, CUBE Analyst considers this form of estimation as a special form of district estimation, without recognizing that a selected number of districts are simply individual zones.
There must be the same number of origin and destination districts, which is not the case for hierarchic estimation.
This approach requires a single estimation.
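The first two points can be pictured as a single zone-to-district correspondence in which each zone of interest is simply a district containing one zone. The Python sketch below is an illustration only: the function name, the district labels, and the list-based representation are assumptions and do not reflect the format of any CUBE Analyst input file.

```python
def combined_mapping(coarse_district_of_zone, detailed_zones):
    """Build a zone -> district index list for combined estimation.

    Zones listed in detailed_zones each become their own single-zone
    district; all other zones keep their coarse district. The same
    mapping is applied to origins and destinations, so the numbers of
    origin and destination districts are equal.
    """
    district_index = {}  # key -> district index, in order of first appearance
    mapping = []
    for zone, coarse in enumerate(coarse_district_of_zone):
        key = ("zone", zone) if zone in detailed_zones else ("district", coarse)
        if key not in district_index:
            district_index[key] = len(district_index)
        mapping.append(district_index[key])
    return mapping, len(district_index)


# Hypothetical study area: six zones in coarse districts A, B and C,
# with zones 2 and 3 selected for estimation at zonal detail.
coarse = ["A", "A", "B", "B", "B", "C"]
mapping, n_districts = combined_mapping(coarse, detailed_zones={2, 3})
print(mapping, n_districts)
```

In this sketch, cells between two single-zone districts correspond to the small squares, cells between two multi-zone districts to the large squares, and the remainder to the rectangles.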

Local matrices

When using hierarchic estimation, CUBE Analyst first estimates a district matrix, which is used to influence the calculation of a set of local matrices. These local matrices contain a mixture of zonal detail and district-based information. The estimated zonal detail is captured automatically by CUBE Analyst and, as each local matrix is estimated, is used to develop progressively an update of the entire matrix at the zonal level of detail. The district matrix simply represents the zonal matrix aggregated into a district matrix, although the district matrix may be non-square, that is, there may be a different number of origin and destination districts. Further information about districts is given later in this section.

Consider a local matrix that is an extension of the combined district and zonal matrix shown and discussed in Estimation with mixed district and zonal detail.

Zonal estimation controlled by district matrix

In this diagram all of the large squares, where information is only estimated at district level, have been shaded. This is because this portion of the matrix is treated in a local matrix as a single unit, termed Rest-of (the)-World — RoW.

A local matrix, therefore, has the following elements:

Detailed zonal level set of cells (the small squares)
Trips in the Rest-of-World (shaded area)
Trips from RoW to zonal level area (rectangular cells)
Trips to RoW from zonal level area (rectangular cells)

A local matrix is defined for each origin and destination district pair (the unshaded part in the figure represents one such pair), and the fully estimated (zonal) matrix is produced when all local matrices have been estimated.

Information involving trips from the RoW is obtained from the district matrix. This element, and the fact that the total number of trips is the same (in principle) for each local matrix, ensures that consistency is maintained across the entire study area, even though detail is calculated separately in estimations for different parts.
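The four elements of a local matrix, and the fact that together they account for every trip in the study area, can be sketched in Python as follows. This is an illustration under simplifying assumptions, not CUBE Analyst's internal representation: the names are hypothetical, and the Rest-of-World rows and columns are collapsed to a single unit for brevity.

```python
def local_matrix_elements(zonal, origin_zones, dest_zones):
    """Split a zonal matrix into the four elements of one local matrix."""
    origin_zones, dest_zones = set(origin_zones), set(dest_zones)
    detail = {}                               # (i, j) -> trips kept at zonal level
    from_row = {j: 0.0 for j in dest_zones}   # RoW -> destination zone j
    to_row = {i: 0.0 for i in origin_zones}   # origin zone i -> RoW
    row_row = 0.0                             # trips wholly within RoW
    for i, row in enumerate(zonal):
        for j, trips in enumerate(row):
            if i in origin_zones and j in dest_zones:
                detail[(i, j)] = trips
            elif j in dest_zones:
                from_row[j] += trips
            elif i in origin_zones:
                to_row[i] += trips
            else:
                row_row += trips
    return detail, from_row, to_row, row_row


# Hypothetical 4-zone matrix; the local matrix covers the pair formed by
# origin district {0, 1} and destination district {2, 3}.
zonal = [
    [0, 5, 2, 1],
    [4, 0, 3, 2],
    [1, 2, 0, 6],
    [3, 1, 7, 0],
]
detail, from_row, to_row, row_row = local_matrix_elements(zonal, [0, 1], [2, 3])

local_total = (sum(detail.values()) + sum(from_row.values())
               + sum(to_row.values()) + row_row)
zonal_total = sum(sum(row) for row in zonal)
print(local_total, zonal_total)  # every trip falls into exactly one element
```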

Summary of the hierarchic estimation process

The hierarchic estimation process may be summarized in four stages:

Creation of districts from zones

The following figure shows a study area divided into many (small) zones (denoted by ij). These are grouped into a number of fewer (and larger) districts (denoted by IJ). Subsequent topics in this chapter give more information about creating districts.

Districts (I,J) and zones (i,j)

Estimate district matrix

This is the first operation by CUBE Analyst, which estimates a small matrix for the 5 to 15 origin and destination districts which are typically defined.

One of the cells, corresponding to a pair of origin and destination districts, which contribute to a local matrix, is referenced as MIJ. The figure Estimate district matrix indicates the information in the district matrix estimation: the prior matrix and trip ends are automatically aggregated from the user’s input zonal-level information. Internally, CUBE Analyst creates a condensed network but does not aggregate the screenline count data. This treatment of data is reflected in CUBE Analyst’s reports on the district matrix (see Figure 7.12b).

Estimate district matrix

Estimate local matrices

CUBE Analyst can estimate all local matrices in one run, but the user may exercise considerable control over this process.

This example relates to a single local matrix, but this stage is repeated for all local matrices. The example considers the same structural elements introduced in the discussion on Zonal estimation controlled by district matrix. The information used to estimate zonal cells, referenced as Mij, includes:

Prior matrix and trip ends are used at zonal level in the estimation.
Count data is used as input where relevant to the local matrix.
Other items are obtained from the corresponding district matrix estimation.

This use of information is reflected in CUBE Analyst’s reports on local matrices (see Figures 7.12c and 7.12d).

Estimate local matrix

Build-up full estimated matrix

This example indicates the construction of the fully estimated matrix from detailed information (Mij) calculated from a set of local matrices. When the matrix is in the form shown in the figure (with only some of the cells estimated), it is referred to as the partially estimated matrix. Those cells in the partially estimated matrix which have not yet been estimated contain copies of the corresponding prior matrix cells.

(This can provide another means of estimating just part of a study area, namely, by restricting the estimation to selected districts/zones of interest.)

When all cells of the partially estimated matrix have been estimated, it, of course, becomes the final fully estimated zonal matrix.
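The build-up can be sketched in Python as overlaying zonal cells onto a copy of the prior matrix. The function name and the dictionary-of-cells representation are hypothetical and are used only to illustrate the idea; they do not describe CUBE Analyst's file handling.

```python
def build_up(prior, local_results):
    """Overlay zonal cells from estimated local matrices onto the prior.

    Returns the (partially) estimated matrix and a flag that is True
    once every cell has been estimated.
    """
    partial = [row[:] for row in prior]  # unestimated cells keep prior values
    estimated = set()
    for detail in local_results:         # one dict of cells per local matrix
        for (i, j), trips in detail.items():
            partial[i][j] = trips
            estimated.add((i, j))
    n_cells = sum(len(row) for row in prior)
    return partial, len(estimated) == n_cells


# Hypothetical 2-zone example in which each zone is its own district.
prior = [[1.0, 2.0], [3.0, 4.0]]
first_run = [{(0, 0): 1.5, (0, 1): 2.5}]
partial, complete = build_up(prior, first_run)
print(partial, complete)   # second row still holds prior values

second_run = first_run + [{(1, 0): 2.0, (1, 1): 4.5}]
full, complete_now = build_up(prior, second_run)
print(full, complete_now)  # all cells estimated
```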

Combine local matrices in partially estimated matrix

Defining districts

Hierarchic estimation is a heuristic method which approximates the formal mathematical methodology provided by a standard run of CUBE Analyst. It is most appropriate when the study area is large enough to encompass sub-areas which can become districts where the travel patterns are reasonably independent of one another.

The purpose of the estimated district matrix is, largely, to consider the inter-district movements, while the focus of local matrices is the intra-district movements. Because precision (greater detail) is associated with the latter, it is desirable to minimize the number of inter-district movements.

The number of local matrices is approximately the square of the number of districts. It therefore can make a considerable difference to computational times whether, say, 10 districts are chosen (about 100 local matrix estimations) or 8 districts (about 64 local matrix estimations).
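Since a local matrix is defined for each origin and destination district pair, the count can be checked with a short Python sketch. The function name is hypothetical, and the result is an upper-level guide only; the number of local matrices actually estimated depends on the districts and pairs defined for a run.

```python
def local_matrix_count(n_origin_districts, n_dest_districts=None):
    """Number of origin/destination district pairs.

    If only one argument is given, the same number of origin and
    destination districts is assumed.
    """
    if n_dest_districts is None:
        n_dest_districts = n_origin_districts
    return n_origin_districts * n_dest_districts


ten = local_matrix_count(10)
eight = local_matrix_count(8)
saving = 1 - eight / ten  # fraction of local estimations avoided
print(ten, eight, f"{saving:.0%}")
```

Dropping from 10 to 8 districts removes roughly a third of the local matrix estimations, although each remaining local matrix covers more zones.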

Not all study area zones can be allocated to districts on the basis of routings through screenlines, either because some or all trips from or to a zone do not pass through a screenline, or because allocation of the zone to a district would violate the maximum number of zones per district. Such zones are then allocated to the adjacent district, based on the coordinates associated with zone centroids. Allocating zones to districts on a basis other than routing behavior can worsen the effects of the approximation implicit in hierarchic estimation. In many cases, this worsening may be negligible in practice, but it will be more significant if those zones involve relatively large numbers of trips, or if a significant proportion of zones are involved. It is this latter consideration which makes it inadvisable to use hierarchic estimation on study areas with fewer than 500 zones.
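The exact rule CUBE Analyst uses to select the adjacent district is not described here. As a rough illustration of a coordinate-based fallback, the Python sketch below assigns each leftover zone to the district of the nearest zone that has already been allocated. The nearest-centroid rule, the function name, and the data are all assumptions, and the sketch ignores the limit on zones per district.

```python
import math


def allocate_by_centroid(centroids, district_of):
    """Assign unallocated zones to the district of the nearest allocated zone.

    centroids:   {zone: (x, y)} for every zone in the study area
    district_of: {zone: district} for zones already allocated by routing
    """
    allocated = dict(district_of)
    for zone, xy in centroids.items():
        if zone in district_of:
            continue
        # Only zones allocated by routing are used as reference points.
        nearest = min(district_of, key=lambda z: math.dist(xy, centroids[z]))
        allocated[zone] = district_of[nearest]
    return allocated


# Hypothetical centroids; zones 3 and 4 could not be allocated by routing.
centroids = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (1.0, 1.0), 4: (9.0, 2.0)}
district_of = {1: "A", 2: "B"}
allocated = allocate_by_centroid(centroids, district_of)
print(allocated)
```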

The considerations involved in defining districts may be summarized as:

The fewer districts the better
The maximum local matrix size is determined by the maximum size of standard estimation that may be conveniently run on the available computer (say, 1,000 to 2,500 zones)
The more allocation of zones to districts on the basis of routings through screenlines the better

Note that it is a feature of hierarchic estimation districts that there may be a different number of origin and destination districts (that is, the district matrix may be non-square), and the allocation of origin zones to origin districts is independent of the allocation of that same zone to a destination district. This enables the asymmetries of trip patterns to be reflected, as, for example, in a morning peak matrix when trips originate from many zones in the suburbs and head for only a few destination zones in the city center. This is of value to the estimation process, but means that the district matrix and the local matrices cannot be reported directly.
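A non-square district matrix can be sketched in Python by giving origins and destinations separate zone-to-district lists. The names and figures below are hypothetical and only illustrate the idea of independent origin and destination allocations.

```python
def aggregate_non_square(zonal, origin_district, dest_district):
    """Aggregate a zonal matrix using independent origin and destination
    district allocations; the result may have unequal rows and columns."""
    n_rows = max(origin_district) + 1
    n_cols = max(dest_district) + 1
    district = [[0.0] * n_cols for _ in range(n_rows)]
    for i, row in enumerate(zonal):
        for j, trips in enumerate(row):
            district[origin_district[i]][dest_district[j]] += trips
    return district


# Hypothetical morning peak: origins are spread over three districts,
# while destinations are grouped into only two.
zonal = [
    [0, 5, 2, 1],
    [4, 0, 3, 2],
    [1, 2, 0, 6],
    [3, 1, 7, 0],
]
origin_district = [0, 1, 2, 2]  # zone 2 is in origin district 2 ...
dest_district = [0, 0, 0, 1]    # ... but in destination district 0

district = aggregate_non_square(zonal, origin_district, dest_district)
print(district)  # 3 origin districts by 2 destination districts
```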

Running CUBE Analyst for hierarchic estimation

CUBE Analyst is run in a similar manner to non-hierarchic estimation except that:

Option DSTRCT=T is set, to indicate calculation/use of a district matrix
LMC and DDF files are provided as additional inputs
Parameter ZCONF is set

If CUBE Analyst is run with an incomplete LMC file, then the estimated matrix is a partially estimated matrix. This matrix provides an additional input file when further local matrices are to be estimated.

The model parameter file only ever contains information relating to the district matrix (and not any local matrices), and the execution log file contains brief summary information for both district and local matrix estimations.

The printout file for hierarchic estimation contains the same type of information as for non-hierarchic estimation, as illustrated in Estimating the matrix. However, there may be many sets of this information: the first set of information always refers to the district matrix estimation. This is followed by a set of information for each local matrix being estimated, noting that this may be none in the case of a combined district and zone estimation. (Because estimations involving many local matrices can generate very large print files, it can be convenient to edit the local matrix control file to create a series of runs of CUBE Analyst in which the size of individual print files is reduced.)

An additional item of information is provided for hierarchic estimation concerning the influence of the district matrix on each local matrix estimation. The table with this information, shown in Figure 7.12c, is labeled Side constraints on matrix totals. This term refers to the constraints of the district matrix on various sides (and elements) of the local matrix, as illustrated previously in Estimate local matrix. Reporting Hierarchic Estimation Results discusses the printout for hierarchic estimation.

Parameter ZCONF

The extent of the constraining effect of the district matrix on the local matrices is determined by CUBE Analyst parameter ZCONF, which acts as a confidence level, treating the district matrix as observed data and the local matrix as estimated. For the local matrix estimation, therefore, the district matrix is just another item of observed data and ZCONF should be set in relation to confidence levels for other items of observed data.

From the user’s point of view, the setting of ZCONF should be a reflection of the degree and importance of the interaction between districts, in terms of trips which cross more than one origin or destination district boundary. (An effect of the automatic generation of districts is to minimize such boundary crossings.) The district matrix contains information about these interactions; if they are important then the district matrix should be made correspondingly significant with a relatively high setting of ZCONF. A low value of ZCONF allows local matrices to reflect local data more precisely, at the expense of the larger picture across the entire study area. A possible symptom of an inappropriate setting of ZCONF might be an unwarranted distortion of the distribution of trip costs/lengths in the estimated matrix.
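One way to look for such a distortion is to compare the trip cost distribution of the estimated matrix with that of the prior matrix. The Python sketch below is a generic diagnostic written for illustration; it is not a CUBE Analyst report, and the function names, matrices, and cost bands are hypothetical.

```python
def mean_trip_cost(trips, costs):
    """Trip-weighted mean cost over all OD cells."""
    total = sum(sum(row) for row in trips)
    weighted = sum(t * c
                   for trip_row, cost_row in zip(trips, costs)
                   for t, c in zip(trip_row, cost_row))
    return weighted / total


def cost_distribution(trips, costs, band_width):
    """Share of trips falling in each cost band of the given width."""
    bands = {}
    for trip_row, cost_row in zip(trips, costs):
        for t, c in zip(trip_row, cost_row):
            band = int(c // band_width)
            bands[band] = bands.get(band, 0.0) + t
    total = sum(bands.values())
    return {band: share / total for band, share in sorted(bands.items())}


# Hypothetical 2-zone example: intra-zonal cost 2, inter-zonal cost 10.
costs = [[2, 10], [10, 2]]
prior = [[30, 20], [20, 30]]
estimated = [[20, 30], [30, 20]]

prior_mean = mean_trip_cost(prior, costs)
estimated_mean = mean_trip_cost(estimated, costs)
print(prior_mean, estimated_mean)
print(cost_distribution(prior, costs, band_width=5))
print(cost_distribution(estimated, costs, band_width=5))
```

A large shift between the two distributions that is not supported by the observed data may suggest that the ZCONF setting is worth reviewing.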